ABSTRACT


It is shown that confidence regions constructed by the repeated-sampling principle are asymptotically valid for sequential designs for nonlinear parameters in general linear models. The related questions of consistency of parameter estimators and convergence of the sequential design to an optimal design are answered positively. An empirical finding of Ford and Silvey (1980) is given a theoretical justification.
AMS (MOS) Subject Classifications: 62L05, 62M10
Work Unit Number 4 - Statistics and Probability


SIGNIFICANCE  AND  EXPLANATION 


For estimation of parameters in nonlinear models, or of nonlinear parameters in linear models, sequential design of experiments is often used to make the best use of the available information, which reduces the number of runs required. After the experiment is terminated at a fixed sample size, inference about the parameter (such as a hypothesis test or a confidence interval) is made. The classical repeated-sampling principle of inference cannot be applied directly, because it relies on repetition of the same design, whereas a sequential design is not repeatable. By using martingales as a technical tool, it is shown that, at least for large samples, such inference is still justified.

The companion questions of consistency of parameter estimators and convergence of the sequential design to an optimal design are also answered.
The responsibility for the wording and views expressed in this descriptive summary lies with MRC, and not with the author of this report.
ASYMPTOTIC INFERENCE FROM SEQUENTIAL DESIGN IN A NONLINEAR SITUATION

C.  F.  J.  Wu 

1.  INTRODUCTION 

A major difficulty in designing a nonlinear experiment is that the performance of a design depends on the unknown parameters. To utilize the information fully, the experiment has to be conducted sequentially: the choice of the next design point is determined by the estimate of the unknown parameters based on the observations made to date; see, for example, Box & Hunter (1965). Since the data thus generated are dependent and the design points are not repeatable, it is not clear whether the repeated-sampling principle of inference can be applied here. Similar inferential questions also arise in other contexts (Cox, 1982; Siegmund, 1980).

Ford & Silvey (1980) studied this question in a special example. Their simulation study indicates that standard confidence intervals, constructed by pretending that the design points were predetermined, perform very well. In §2 we provide a theoretical justification of this empirical finding. In §3 we consider the general problem of sequential design and inference when the parameter of interest is a nonlinear smooth function of the linear parameters in a general linear model. Three issues to be studied are:

(A) consistency of the parameter estimator;

(B) asymptotic validity of the standard procedures for confidence regions;

(C) convergence of the sequential design, properly normalized, to an optimal design.


Details are in §3. The answer to each is yes under quite weak conditions. Crucial to our investigation is a martingale structure underlying the problem. Issue (B) in small samples was studied in an unpublished manuscript by Ford, Titterington & Wu.


2.  A  SIMPLE  EXAMPLE 

Ford & Silvey (1980) considered the design problem for estimating the nonlinear function $g(\theta) = -\theta_1/(2\theta_2)$ in the linear model

$$y = \theta_1 u + \theta_2 u^2 + \epsilon = \theta^T v + \epsilon, \qquad v = (u, u^2)^T,$$

where $y$ is an observed response corresponding to a control variable at level $u$, $u \in [-1, 1]$, and $\epsilon$ is an independent $N(0, 1)$ error.

Take the first two observations at $u = \pm 1$. For $r \ge 2$, let $\hat\theta_r = (\hat\theta_{r1}, \hat\theta_{r2})$ be the maximum likelihood estimator of $\theta$ based on the first $r$ observations, $\hat g_r = g(\hat\theta_r)$, and $J_r = v_1 v_1^T + \cdots + v_r v_r^T$ be the corresponding information matrix, $v_r = (u_r, u_r^2)^T$. By maximization of the Gateaux derivative at $\hat\theta_r$ of the Fisher information of $\hat g_r$, the next design point $u_{r+1}$ is chosen from $[-1, 1]$ to maximize

$$d_r(u) = (v^T J_r^{-1} c_{\hat g_r})^2, \qquad c_g = (1, 2g)^T.$$

It turns out that $u_{r+1}$ must be $1$ or $-1$.
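The endpoint property is easy to see: $v^T J_r^{-1} c_{\hat g_r}$ has the form $au + bu^2$ with no constant term, and the largest value of $|au + bu^2|$ on $[-1, 1]$ is $|a| + |b|$, attained at $u = 1$ or $u = -1$. Below is a minimal NumPy sketch of the selection step under that observation. The function names and the tie-break toward $u = 1$ are illustrative choices that are not in the source. The example also includes a grid evaluation to confirm numerically that no interior point does better.

```python
import numpy as np


def c_vec(g):
    """c_g = (1, 2g)^T, as in the definition of d_r(u)."""
    return np.array([1.0, 2.0 * g])


def d_r(u, J, g_hat):
    """d_r(u) = (v^T J^{-1} c_{g_hat})^2 with v = (u, u^2)^T."""
    w = np.linalg.solve(J, c_vec(g_hat))
    v = np.array([u, u * u])
    return float(v @ w) ** 2


def next_design_point(J, g_hat):
    """Choose u_{r+1} from {1, -1}; ties go to u = 1 (an arbitrary choice)."""
    return 1.0 if d_r(1.0, J, g_hat) >= d_r(-1.0, J, g_hat) else -1.0


if __name__ == "__main__":
    # J after the first two observations, taken at u = 1 and u = -1.
    J = np.array([[2.0, 0.0], [0.0, 2.0]])
    grid = np.linspace(-1.0, 1.0, 2001)
    values = [d_r(u, J, -0.25) for u in grid]
    # The grid maximizer should coincide with the endpoint chosen by the rule.
    print(next_design_point(J, -0.25), grid[int(np.argmax(values))])
```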

Suppose that, among the first $n$ observations, $s_n$ are taken at $u = -1$ and $n - s_n$ at $u = 1$, with their means denoted by $\bar y_-$ and $\bar y_+$. Note that $s_n$ is random. Ford & Silvey (1980) showed that

$$\bar y_+ \to \theta_2 + \theta_1, \qquad \bar y_- \to \theta_2 - \theta_1 \qquad (2.1)$$

with probability 1 and

$$s_n/n \to \eta^*(-1) = 1 - \eta^*(1) = |\theta_2 + \theta_1| / \{|\theta_2 + \theta_1| + |\theta_2 - \theta_1|\}, \qquad (2.2)$$


which is the probability placed at $-1$ by the optimal continuous design $\eta^*$ that minimizes $c_g^T M^-(\eta) c_g$ over $\eta$. Here $M^-(\eta)$ is a g-inverse of the normalized information matrix $M(\eta) = E(vv^T)$, and $\eta$ is a probability measure on $u$ over $[-1, 1]$. Note that $M(\eta^*)$ is singular if and only if $|g(\theta)| = 1/2$. For $g(\theta) = -1/2$, $\eta^*(-1) = 1$; for $g(\theta) = 1/2$, $\eta^*(1) = 1$. The strong consistency of the maximum likelihood estimator

$$\hat\theta_n = (\hat\theta_{n1}, \hat\theta_{n2}) = \tfrac{1}{2}(\bar y_+ - \bar y_-,\ \bar y_+ + \bar y_-) \to \theta \qquad (2.3)$$

follows from (2.1).
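The closed form (2.2) can be checked numerically, restricting attention to designs supported on $\{-1, 1\}$ as in the example. The NumPy sketch below puts weight $w$ at $u = -1$ and weight $1 - w$ at $u = 1$. It evaluates $c_g^T M(\eta)^{-1} c_g$ on a grid of $w \in (0, 1)$ and compares the minimizing weight with (2.2). The sketch assumes the nonsingular case $|g(\theta)| \ne 1/2$, and the helper names are hypothetical.

```python
import numpy as np


def eta_star_minus1(theta1, theta2):
    """Weight placed at u = -1 by the optimal design, formula (2.2)."""
    a = abs(theta2 + theta1)
    b = abs(theta2 - theta1)
    return a / (a + b)


def c_criterion(w, g):
    """c_g^T M(eta)^{-1} c_g for weight w at u = -1 and 1 - w at u = +1."""
    v_plus = np.array([1.0, 1.0])    # v(1) = (1, 1)
    v_minus = np.array([-1.0, 1.0])  # v(-1) = (-1, 1)
    M = (1.0 - w) * np.outer(v_plus, v_plus) + w * np.outer(v_minus, v_minus)
    c = np.array([1.0, 2.0 * g])
    return float(c @ np.linalg.solve(M, c))


def eta_star_numeric(theta1, theta2, num=10001):
    """Grid search for the c-optimal weight at u = -1."""
    g = -theta1 / (2.0 * theta2)
    weights = np.linspace(1e-4, 1.0 - 1e-4, num)
    values = [c_criterion(w, g) for w in weights]
    return float(weights[int(np.argmin(values))])
```

For $\theta = (0.5, 1.0)$, for instance, (2.2) gives $\eta^*(-1) = 1.5/2 = 0.75$, and the grid search should land next to it.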

Can confidence intervals for $g$ be constructed in the usual manner? The answer is not obvious, because the sequential generation of the design points makes the observations dependent. Repeated sampling of the sequential design produces different choices of the design points $\{u_r\}$, which makes the distributional calculus quite intractable. If we pretended that the design had been chosen a priori, standard theory would give (Ford & Silvey, 1980, (5.2))

$$\hat g_n \pm 1.96\, |2\hat\theta_{n2}|^{-1} (c_{\hat g_n}^T J_n^{-1} c_{\hat g_n})^{1/2} \qquad (2.4)$$

as an approximate 95% confidence interval for $g$. An alternative to (2.4) is to replace $J_n$ by $nM(\eta^*_{\hat g_n})$ since, from (2.2), $J_n/n \to M(\eta^*)$. The two versions are asymptotically equivalent. The latter was shown to perform remarkably well in the empirical study of Ford & Silvey (1980), where the empirical coverage percentages of the true parameter are quite close to 95%. A theoretical justification for (2.4) is now in order.
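A small Monte Carlo sketch, assuming NumPy, can reproduce the kind of finding described above. It runs the sequential rule of this section and forms the interval (2.4) as if the design had been fixed in advance. It then records how often the interval covers the true $g$. The true parameter $\theta = (0.5, 1.0)$, the sample size, the replication count and all function names are illustrative assumptions. With $|g(\theta)| \ne 1/2$, one would expect coverage near 95% and a mean of $s_n/n$ near $\eta^*(-1) = 0.75$ from (2.2).

```python
import numpy as np


def run_sequential_design(theta, n, rng):
    """Simulate n >= 2 observations from y = theta1*u + theta2*u^2 + N(0, 1).

    The first two points are u = 1 and u = -1; afterwards u_{r+1} in {1, -1}
    maximizes d_r(u) = (v^T J_r^{-1} c_{g_hat})^2.
    Returns J_n, b_n = sum of v_i y_i, and s_n.
    """
    theta1, theta2 = theta
    J = np.zeros((2, 2))  # J_r = v_1 v_1^T + ... + v_r v_r^T
    b = np.zeros(2)       # v_1 y_1 + ... + v_r y_r, so theta_hat = J^{-1} b
    s = 0                 # s_n: number of observations taken at u = -1

    def observe(u):
        nonlocal s
        v = np.array([u, u * u])
        y = theta1 * u + theta2 * u * u + rng.standard_normal()
        J[:] += np.outer(v, v)  # in-place update of the enclosing arrays
        b[:] += v * y
        s += int(u < 0)

    observe(1.0)
    observe(-1.0)
    for _ in range(n - 2):
        th = np.linalg.solve(J, b)  # least squares = ML under N(0, 1) errors
        g_hat = -th[0] / (2.0 * th[1])
        w = np.linalg.solve(J, np.array([1.0, 2.0 * g_hat]))
        d_plus, d_minus = (w[0] + w[1]) ** 2, (-w[0] + w[1]) ** 2
        observe(1.0 if d_plus >= d_minus else -1.0)
    return J, b, s


def interval_24(J, b, z=1.96):
    """Interval (2.4): g_hat -/+ z |2 theta2_hat|^{-1} (c^T J^{-1} c)^{1/2}."""
    th = np.linalg.solve(J, b)
    g_hat = -th[0] / (2.0 * th[1])
    c = np.array([1.0, 2.0 * g_hat])
    half = z * np.sqrt(c @ np.linalg.solve(J, c)) / abs(2.0 * th[1])
    return g_hat - half, g_hat + half


def coverage(theta, n=200, reps=200, seed=0):
    """Monte Carlo coverage of (2.4) for the true g, and the mean of s_n / n."""
    rng = np.random.default_rng(seed)
    g = -theta[0] / (2.0 * theta[1])
    hits, fractions = 0, []
    for _ in range(reps):
        J, b, s = run_sequential_design(theta, n, rng)
        lo, hi = interval_24(J, b)
        hits += int(lo <= g <= hi)
        fractions.append(s / n)
    return hits / reps, float(np.mean(fractions))
```

Calling `coverage((0.5, 1.0))` returns the pair (empirical coverage, mean of $s_n/n$). Replacing $J_n$ by $nM(\eta^*_{\hat g_n})$ inside `interval_24` would give the alternative version mentioned above.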




From (2.3) and $J_n/n \to M(\eta^*)$, the asymptotic validity of (2.4) can be established via the asymptotic normality of the normalized statistic

$$\frac{\sqrt{n}(\hat g_n - g)}{(2\theta_2)^{-1} (c_g^T M^-(\eta^*) c_g)^{1/2}} \to N(0, 1). \qquad (2.5)$$

We shall give the proof separately for singular and nonsingular $M(\eta^*)$.

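First consider $g(\theta) = -1/2$; the treatment of $g(\theta) = 1/2$ is similar. Here $\theta_1 = \theta_2$, so $E\bar y_+ = \theta_1 + \theta_2 = 2\theta_2$ and $E\bar y_- = \theta_2 - \theta_1 = 0$. Since $\hat g_n = \tfrac{1}{2}(\bar y_- - \bar y_+)/(\bar y_- + \bar y_+)$, the numerator of (2.5) equals $\sqrt{n}\,\bar y_-/(\bar y_+ + \bar y_-)$, which can be approximated by $\sqrt{n}\,\bar y_- (2\theta_2)^{-1}$ via (2.1). Since $\eta^*(-1) = 1$, we have $M(\eta^*) = v(-1)v(-1)^T$ and $c_g = (1, -1)^T = -v(-1)$, so $c_g^T M^-(\eta^*) c_g = 1$ and the denominator of (2.5) is $(2\theta_2)^{-1}$. Hence (2.5) can be approximated by $\sqrt{n}\,\bar y_-$. Its asymptotic normality follows from the central limit theorem applied to $\sqrt{s_n}\,\bar y_-$, together with $s_n/n \to \eta^*(-1) = 1$. Here we use the fact that, given $s_n$, the observations taken at $u = -1$ are independent and identically distributed.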

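For $|g(\theta)| \ne 1/2$, a more general result will be proved. Note that

$$\hat g_n = \tfrac{1}{2}(\bar y_- - \bar y_+)/(\bar y_- + \bar y_+) = A(\bar y_+, \bar y_-)$$

is a smooth function of $\bar y_-$ and $\bar y_+$. Similarly, $g = A(\phi_1, \phi_2)$, where $\phi_1 = E\bar y_+$ and $\phi_2 = E\bar y_-$. From the smoothness of $A$ and (2.1), the asymptotic distribution of $\sqrt{n}(\hat g_n - g)$ is given by that of its first-order approximation

$$A_1(\phi)\sqrt{n}(\bar y_+ - \phi_1) + A_2(\phi)\sqrt{n}(\bar y_- - \phi_2), \qquad (2.6)$$

where $A_1(\phi)$ and $A_2(\phi)$ are the partial derivatives of $A$ at $\phi = (\phi_1, \phi_2)$ with respect to $\phi_1$ and $\phi_2$. The denominator of (2.5) equals

$$\{A_1^2(\phi)/\eta^*(1) + A_2^2(\phi)/\eta^*(-1)\}^{1/2}. \qquad (2.7)$$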

Therefore, (2.5) would follow from the asymptotic normality of the ratio of (2.6) and (2.7). That normality is an easy consequence of

$$\sqrt{n}\, a_1(\bar y_+ - \phi_1) + \sqrt{n}\, a_2(\bar y_- - \phi_2) \to N\!\left(0,\ \frac{a_1^2}{\eta^*(1)} + \frac{a_2^2}{\eta^*(-1)}\right) \qquad (2.8)$$

in distribution for any $a_1$ and $a_2$; the proof of (2.8) is given in the Appendix.
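The claim that (2.7) equals the denominator of (2.5) can be checked numerically. Both sides are compared up to the sign of $(2\theta_2)^{-1}$, which does not affect (2.5). Differentiating $A(\phi_1, \phi_2) = \tfrac{1}{2}(\phi_2 - \phi_1)/(\phi_1 + \phi_2)$ gives $A_1 = -\phi_2/(\phi_1 + \phi_2)^2$ and $A_2 = \phi_1/(\phi_1 + \phi_2)^2$. The NumPy sketch below uses these partial derivatives and evaluates both sides for a two-point design with weight $w$ at $u = -1$. The algebra shows the identity holds for every $w \in (0, 1)$, not only for $\eta^*$. The function names are hypothetical.

```python
import numpy as np


def denominator_25(theta1, theta2, w):
    """|2 theta2|^{-1} (c_g^T M^{-1} c_g)^{1/2}, with weight w at u = -1."""
    g = -theta1 / (2.0 * theta2)
    c = np.array([1.0, 2.0 * g])
    # M = (1 - w) v(1) v(1)^T + w v(-1) v(-1)^T with v(1) = (1, 1), v(-1) = (-1, 1)
    M = np.array([[1.0, 1.0 - 2.0 * w], [1.0 - 2.0 * w, 1.0]])
    return float(np.sqrt(c @ np.linalg.solve(M, c)) / abs(2.0 * theta2))


def denominator_27(theta1, theta2, w):
    """{A_1^2 / eta(1) + A_2^2 / eta(-1)}^{1/2} with phi = (E y_plus, E y_minus)."""
    phi1, phi2 = theta1 + theta2, theta2 - theta1
    s = phi1 + phi2
    a1, a2 = -phi2 / s**2, phi1 / s**2   # partial derivatives of A at phi
    return float(np.sqrt(a1**2 / (1.0 - w) + a2**2 / w))
```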

It is obvious from the arguments that the normality assumption on $\epsilon$ in the linear model is not essential.

3.  GENERAL  PROBLEM 

The above example is simple and special in that the observations are always taken at $u = \pm 1$. Similar results will be obtained in this section for a more general problem under additional assumptions.

We consider the general linear model

$$y = x^T\theta + \epsilon,$$

where $\theta$ is a $p \times 1$ vector, and the design variable $x$ can be chosen anywhere within a bounded design region $\mathcal{X}$. Assumptions on $\epsilon$ are given in (3.3). The $q \times 1$ vector parameter of interest is $\psi = g(\theta)$, which is a nonlinear smooth function of $\theta$. Let $\hat\theta_n$ be the least squares estimator of $\theta$ based on the first $n$ observations $(y_i, x_i)$. The variance-covariance matrix of $\hat\psi_n = g(\hat\theta_n)$ is approximated by $\sigma^2 g'(\theta)^T M_n^{-1} g'(\theta)$, where $M_n = x_1 x_1^T + \cdots + x_n x_n^T$ and $g'(\theta)$ is the $p \times q$ derivative of $g$. The next design point $x_{n+1}$ is chosen from $x \in \mathcal{X}$ to minimize

$$\Phi(M_n + xx^T, \hat\theta_n) = \Phi(g'(\hat\theta_n)^T (M_n + xx^T)^{-1} g'(\hat\theta_n)), \qquad (3.1)$$

where the "optimality criterion" $\Phi$ is a scalar function. For the example in §2, $q = 1$ and $\Phi$ is the identity map. Another choice of $x_{n+1}$ is to minimize the Fréchet derivative of $\Phi$ at $M_n$ and $\hat\theta_n$ in the direction $xx^T$ (Silvey, 1980), that is,

$$\lim_{\lambda \downarrow 0} \lambda^{-1}[\Phi\{(1-\lambda)M_n + \lambda xx^T, \hat\theta_n\} - \Phi(M_n, \hat\theta_n)]. \qquad (3.2)$$
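Take the trace criterion $\Phi(M, \theta) = \operatorname{tr}\{g'(\theta)^T M^{-1} g'(\theta)\}$ as an illustration. The derivative of $\{(1-\lambda)M + \lambda xx^T\}^{-1}$ at $\lambda = 0$ is $M^{-1} - M^{-1}xx^TM^{-1}$. The limit in (3.2) is therefore $\Phi(M, \theta) - \|g'(\theta)^T M^{-1} x\|^2$, so minimizing (3.2) amounts to maximizing $\|g'(\theta)^T M^{-1} x\|^2$. For $q = 1$ this is $(x^T M^{-1} g'(\theta))^2$, which has the same form as $d_r(u)$ in §2 up to a positive constant. The NumPy sketch below assumes the trace criterion and uses hypothetical helper names. It compares the closed form with a one-sided difference quotient.

```python
import numpy as np


def phi_trace(M, G):
    """Trace criterion tr(G^T M^{-1} G), with G = g'(theta) of shape (p, q)."""
    return float(np.trace(G.T @ np.linalg.solve(M, G)))


def frechet_trace(M, G, x):
    """Closed form of the limit (3.2) for the trace criterion."""
    z = G.T @ np.linalg.solve(M, x)  # G^T M^{-1} x, shape (q,)
    return phi_trace(M, G) - float(z @ z)


def frechet_numeric(M, G, x, lam=1e-7):
    """One-sided difference quotient approximating the limit in (3.2)."""
    M_lam = (1.0 - lam) * M + lam * np.outer(x, x)
    return (phi_trace(M_lam, G) - phi_trace(M, G)) / lam
```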

The next response $y_{n+1}$ is observed at $x_{n+1}$, and $\hat\theta_{n+1}$ is defined similarly. Since the $\{y_n\}$ are dependent, it is not obvious that standard results in linear model theory still hold. Three major issues to be studied are:

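(A) Consistency of $\hat\theta_n$: Does $\hat\theta_n \to \theta$ with probability 1?

(B) Asymptotic distribution of $\hat\theta_n$: Does $(\hat\theta_n - \theta)^T M_n (\hat\theta_n - \theta) \to \sigma^2\chi^2_p$ in distribution?

Consistency of $\hat\sigma^2$: Does $\hat\sigma^2 = \sum_{i=1}^n (y_i - x_i^T\hat\theta_n)^2/(n - p) \to \sigma^2$ with probability 1?

(C) Convergence of $n^{-1}M_n$ to an optimal design: Does $n^{-1}M_n \to D^*_\theta$? Here $D^*_\theta = D(\eta^*)$ is an optimal design minimizing $\Phi(g'(\theta)^T D(\eta)^{-1} g'(\theta))$ over the normalized information matrices $D(\eta) = \int_{\mathcal{X}} xx^T\,\eta(dx)$ with $\int_{\mathcal{X}} \eta(dx) = 1$.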

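Note that (A) implies the consistency of $\hat\psi_n$ to $\psi$. Also, (B) implies the asymptotic validity of the standard confidence ellipsoid for $\theta$,

$$\{\theta : (\hat\theta_n - \theta)^T M_n (\hat\theta_n - \theta) \le p\,\hat\sigma^2 F_\alpha(p, n - p)\},$$

where $F_\alpha$ is the upper $\alpha$ point of the $F$ distribution.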
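Below is a minimal NumPy sketch of the ellipsoid check and of $\hat\sigma^2$. The caller supplies the critical value $F_\alpha(p, n-p)$ rather than the code computing it; SciPy's `scipy.stats.f.ppf(1 - alpha, p, n - p)` is one way to obtain it. The function names are hypothetical.

```python
import numpy as np


def least_squares_fit(X, y):
    """Return theta_hat, M_n = X^T X and sigma2_hat = RSS / (n - p)."""
    n, p = X.shape
    M = X.T @ X
    theta_hat = np.linalg.solve(M, X.T @ y)
    resid = y - X @ theta_hat
    sigma2_hat = float(resid @ resid) / (n - p)
    return theta_hat, M, sigma2_hat


def in_theta_ellipsoid(theta0, theta_hat, M, sigma2_hat, f_crit):
    """True if (theta_hat - theta0)^T M (theta_hat - theta0) <= p * s^2 * F."""
    d = theta_hat - theta0
    p = len(theta_hat)
    return float(d @ M @ d) <= p * sigma2_hat * f_crit
```

The computation is the same whether the rows of `X` were fixed in advance or generated sequentially. Question (B) concerns whether this usage remains asymptotically legitimate in the sequential case.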


The interpretation of (C) will be given for a special case. Take the optimality criterion to be the average of the asymptotic variances of the components of $\hat\psi_n$, that is, $\Phi$ in (3.1) is the trace of a matrix. Then (C) says that the average variance of $\hat\psi_n$ for the design $\{x_1, \ldots, x_n\}$ is asymptotically minimized as $n \to \infty$.
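Rule (3.1) with the trace criterion can be sketched as a search over a finite candidate set. The NumPy sketch below rests on assumptions that are not part of the general setting: a finite grid stands in for $\mathcal{X}$, and the example uses the quadratic model $x = (1, u, u^2)^T$ with the turning-point function $g(\theta) = -\theta_1/(2\theta_2)$. The helper names are also hypothetical.

```python
import numpy as np


def select_next_trace(M, G, candidates):
    """Return the candidate x minimizing tr(G^T (M + x x^T)^{-1} G), as in (3.1)."""
    best_x, best_val = None, np.inf
    for x in candidates:
        val = float(np.trace(G.T @ np.linalg.solve(M + np.outer(x, x), G)))
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val


def turning_point_gradient(theta):
    """g'(theta) for g(theta) = -theta1 / (2 theta2), theta = (theta0, theta1, theta2)."""
    _, t1, t2 = theta
    return np.array([[0.0], [-1.0 / (2.0 * t2)], [t1 / (2.0 * t2**2)]])


if __name__ == "__main__":
    grid = [np.array([1.0, u, u * u]) for u in np.linspace(-1.0, 1.0, 41)]
    M = sum(np.outer(x, x) for x in grid[::20])  # initial design at u = -1, 0, 1
    G = turning_point_gradient(np.array([0.0, 0.5, -1.0]))  # at a current estimate
    x_next, val = select_next_trace(M, G, grid)
    print(x_next, val)
```

In a full sequential run, `G` would be recomputed at each new $\hat\theta_n$, and `M` would be augmented by the chosen $x_{n+1}x_{n+1}^T$ before the next call.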

Questions (A) and (B) will be studied for more general sequential generation rules. Let $x_{n+1}$ be an arbitrary measurable function of the past, $(x_1, y_1, \ldots, x_n, y_n)$. We assume that, for all $i$,

$$E(\epsilon_i \mid \epsilon_1, \ldots, \epsilon_{i-1}) = 0, \qquad E(\epsilon_i^2 \mid \epsilon_1, \ldots, \epsilon_{i-1}) = \sigma^2 < \infty, \qquad (3.3)$$

that is, $\{\epsilon_i\}$ is a martingale difference sequence with conditional variance $\sigma^2$.

We also assume that, for some $\delta > 0$, with probability 1

$$\{\log \lambda_{\max}(n)\}^{1+\delta} / \lambda_{\min}(n) \to 0, \qquad (3.4)$$

where $\lambda_{\min}(n)$ and $\lambda_{\max}(n)$ are the minimum and maximum eigenvalues of the random matrix $M_n$. Property (3.4) implies $\lambda_{\min}(n) \to \infty$. Under (3.3)-(3.4), the strong consistency of $\hat\theta_n$ to $\theta$ follows from Corollary 3 of Lai & Wei (1982). This answers (A).
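Condition (3.4) can be monitored along a simulated design sequence. The NumPy sketch below evaluates the ratio in (3.4) for a given $M_n$; the function name is hypothetical. Because (3.4) is a limit statement, a single value says little, and only the trend over increasing $n$ is informative. The sketch also assumes $\lambda_{\max}(n) > 1$, so that the logarithm is positive.

```python
import numpy as np


def growth_ratio(M, delta):
    """{log lambda_max}^{1+delta} / lambda_min for a symmetric matrix M, as in (3.4)."""
    eig = np.linalg.eigvalsh(M)  # eigenvalues in ascending order
    lam_min, lam_max = float(eig[0]), float(eig[-1])
    if lam_max <= 1.0 or lam_min <= 0.0:
        raise ValueError("need lambda_max > 1 and lambda_min > 0")
    return np.log(lam_max) ** (1.0 + delta) / lam_min
```

For example, $M_n = nI$ gives $(\log n)^{1+\delta}/n \to 0$. By contrast, $M_n = \mathrm{diag}(n, \log n)$ gives $(\log n)^{\delta} \to \infty$, so (3.4) fails when the minimum eigenvalue grows only logarithmically.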

Before studying (B), we point out an underlying martingale structure. It explains why standard asymptotic results for the fixed design problem hold for the sequential design problem under consideration. Consider

$$\hat\theta_n - \theta = M_n^{-1}(x_1\epsilon_1 + \cdots + x_n\epsilon_n).$$

Here $\sum_i x_i\epsilon_i$ is a martingale, since $x_i$ is a function of the past and $\{\epsilon_i\}$ is a martingale difference sequence. Under the growth rate condition (3.4) on $M_n$, the consistency of $\hat\theta_n$ follows from a martingale strong law of large numbers. For the asymptotic normality of $\hat\theta_n$, we use the following stability condition on the random matrix $M_n$. There exists a non-random positive definite matrix $B_n$ such that

$$B_n^{-1} M_n^{1/2} \to I \quad \text{and} \quad \max_{1 \le i \le n} x_i^T B_n^{-2} x_i \to 0 \quad \text{in probability}. \qquad (3.5)$$

Condition (3.5) ensures that $\hat\theta_n - \theta$ can be approximated by $B_n^{-2}\sum_i x_i\epsilon_i$, whose asymptotic normality follows from a martingale central limit theorem. Note that (3.5) is considerably weaker than the objective in (C) that $M_n/n$ converges to an optimal design matrix.
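For least squares, the decomposition $\hat\theta_n - \theta = M_n^{-1}\sum_i x_i\epsilon_i$ is an algebraic identity, so it holds whatever rule generated the $x_i$. The NumPy sketch below checks it under a hypothetical adaptive rule, $u_{i+1} = \tanh(y_i)$, in the model $x = (1, u, u^2)^T$. This rule is made up purely for illustration and is not from the source. A single simulated path can display the identity, but not the martingale property itself. That property rests on $E(x_i\epsilon_i \mid \text{past}) = x_i\,E(\epsilon_i \mid \text{past}) = 0$.

```python
import numpy as np


def adaptive_path(theta, n, rng):
    """Generate (X, y, eps) where each new design point depends on the last response.

    Hypothetical rule: u_{i+1} = tanh(y_i), which lies in (-1, 1).
    """
    X, y, eps = [], [], []
    u = 1.0
    for _ in range(n):
        x = np.array([1.0, u, u * u])
        e = rng.standard_normal()
        yi = float(x @ theta) + e
        X.append(x)
        y.append(yi)
        eps.append(e)
        u = np.tanh(yi)  # next design point is a function of the past
    return np.array(X), np.array(y), np.array(eps)


def check_decomposition(theta, n=100, seed=0):
    """Return (theta_hat - theta, M_n^{-1} S_n) for one simulated path."""
    rng = np.random.default_rng(seed)
    X, y, eps = adaptive_path(theta, n, rng)
    M = X.T @ X
    theta_hat = np.linalg.solve(M, X.T @ y)
    S = X.T @ eps  # S_n = x_1 eps_1 + ... + x_n eps_n
    return theta_hat - theta, np.linalg.solve(M, S)
```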

Under (3.3) and (3.5), the asymptotic normality of $\hat\theta_n$, i.e. $(\hat\theta_n - \theta)^T M_n (\hat\theta_n - \theta) \to \sigma^2\chi^2_p$, follows from Theorem 3 of Lai & Wei (1982). Under (3.3), the strong consistency of $\hat\sigma^2$ in (B) follows from Lemma 3 of Lai & Wei. The only regularity condition of that lemma is $n^{-1}\log\lambda_{\max}(n) \to 0$. It is satisfied because the design region is assumed bounded, so that $\lambda_{\max}(n) \le n \sup_{x \in \mathcal{X}} \|x\|^2$. Therefore, the standard confidence ellipsoid for $\theta$ is asymptotically valid under (3.5). A confidence region for $\psi$ can be obtained from the confidence ellipsoid for $\theta$ by the $g$ transformation. Its validity needs the additional condition (3.4), which ensures the consistency of $\hat\theta_n$. A confidence ellipsoid for $\psi$ can also be constructed directly as

$$\{\psi : (\hat\psi_n - \psi)^T [g'(\hat\theta_n)^T M_n^{-1} g'(\hat\theta_n)]^{-1} (\hat\psi_n - \psi) \le q\,\hat\sigma^2 F_\alpha(q, n - p)\}.$$

Its asymptotic validity can be established from (A) and (B) under (3.3)-(3.5) as before.
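The following NumPy sketch implements the direct ellipsoid for $\psi$. It assumes the caller supplies $g'(\hat\theta_n)$ as a $p \times q$ array together with the critical value $F_\alpha(q, n-p)$. The function name is hypothetical. For $q = 1$ the region reduces to the interval $\hat\psi_n \pm \{\hat\sigma^2 F_\alpha(1, n-p)\, g'(\hat\theta_n)^T M_n^{-1} g'(\hat\theta_n)\}^{1/2}$.

```python
import numpy as np


def in_psi_ellipsoid(psi0, psi_hat, G_hat, M, sigma2_hat, f_crit):
    """Check (psi_hat - psi0)^T [G^T M^{-1} G]^{-1} (psi_hat - psi0) <= q s^2 F.

    G_hat = g'(theta_hat) has shape (p, q); f_crit is F_alpha(q, n - p),
    supplied by the caller.
    """
    q = G_hat.shape[1]
    V = G_hat.T @ np.linalg.solve(M, G_hat)  # q x q, approx. covariance / sigma^2
    d = np.atleast_1d(psi_hat - psi0)
    stat = float(d @ np.linalg.solve(V, d))
    return stat <= q * sigma2_hat * f_crit
```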

We have answered questions (A) and (B) for very general rules that satisfy (3.3)-(3.5); these conditions are, however, not easy to verify. For the simple example of §2, they are either satisfied or not required. For general problems, further discussion is given later in connection with (C).


We now consider question (C). If the normalized matrix $n^{-1}M_n$ converges to a nonsingular optimal design matrix $D^*_\theta$, then the conditions (3.4)-(3.5), required for (A) and (B), are satisfied. The updating of $M_n$ is governed by

$$(n+1)^{-1}M_{n+1} = \{1 - (n+1)^{-1}\}\, n^{-1}M_n + (n+1)^{-1} x_{n+1}x_{n+1}^T, \qquad (3.6)$$

where $x_{n+1}$ is chosen according to (3.1) or (3.2), which depends on the current estimate $\hat\theta_n$. The algorithm (3.6) has been studied extensively in the case where the criterion (3.1) or (3.2) is evaluated at the true parameter $\theta$. In that case, the convergence of $n^{-1}M_n$ to $D^*_\theta$ was established for $\Phi$ = determinant (Wynn, 1972; Pazman, 1974) and $\Phi$ = trace (Wu & Wynn, 1978), assuming that $D^*_\theta$ is nonsingular. By a continuity argument, suppose $\hat\theta_n$ converges to $\theta$ with probability 1. Then, for the same criterion, $n^{-1}M_n$ with $x_{n+1}$ chosen by (3.1) or (3.2) converges to $D^*_\theta$ with probability 1. Since $D^*_\theta$ is assumed nonsingular, the above result does not cover the example in §2 with $|g(\theta)| = 1/2$.
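The recursion (3.6) can be run directly when the criterion is evaluated at the true parameter. The NumPy sketch below does this for the example of §2 with $\theta = (0.5, 1.0)$, a value chosen only for illustration. Here $g = -1/4$, so $D^*_\theta = M(\eta^*)$ is nonsingular, with $\eta^*(-1) = 0.75$ by (2.2). Each step picks $u \in \{1, -1\}$ by maximizing $(v^T D_n^{-1} c_g)^2$. This is the rule of §2 with the true $g$ and with $J_r$ replaced by $D_n = n^{-1}M_n$; rescaling does not change the maximizer. The run illustrates the convergence results cited above but does not prove them. The function name is hypothetical.

```python
import numpy as np


def run_recursion(theta, n_steps):
    """Iterate (3.6) for the model of Section 2 with the criterion at the true theta.

    Returns D_n from the recursion and M_n / n computed directly, for comparison.
    """
    theta1, theta2 = theta
    g = -theta1 / (2.0 * theta2)
    c = np.array([1.0, 2.0 * g])
    v = {1.0: np.array([1.0, 1.0]), -1.0: np.array([-1.0, 1.0])}
    # Start from the first two observations, at u = 1 and u = -1 (n = 2).
    D = 0.5 * (np.outer(v[1.0], v[1.0]) + np.outer(v[-1.0], v[-1.0]))
    M = 2.0 * D
    n = 2
    for _ in range(n_steps):
        w = np.linalg.solve(D, c)
        u = 1.0 if (v[1.0] @ w) ** 2 >= (v[-1.0] @ w) ** 2 else -1.0
        x = v[u]
        D = (1.0 - 1.0 / (n + 1)) * D + np.outer(x, x) / (n + 1)  # recursion (3.6)
        M = M + np.outer(x, x)
        n += 1
    return D, M / n
```

With weight $0.75$ at $u = -1$, the target is $D^*_\theta = 0.25\,v(1)v(1)^T + 0.75\,v(-1)v(-1)^T$, which has unit diagonal and off-diagonal entries $-0.5$.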

The strong consistency of $\hat\theta_n$, essential for the above argument, depends on the growth rate condition (3.4) on the random matrix $M_n$. The selection rules (3.1) and (3.2) do not automatically satisfy this condition. To ensure (3.4), the rules have to be modified so that the minimum eigenvalue of $M_n$ grows to infinity at a rate no less than $(\log n)^{1+\delta}$ for some $\delta > 0$. This means that we occasionally have to switch from (3.1) or (3.2) to a rule that maximizes the minimum eigenvalue of the augmented design matrix. The strong consistency of $\hat\theta_n$ is then guaranteed. It would be interesting to see whether the convergence results for $n^{-1}M_n$ cited above still hold for the modified rules. It would then imply (3.5) and the asymptotic validity